1. Introduction and Summary

In this paper, a two-sample, two-stage non-parametric estimation
problem will be studied. The parameter $\theta = \theta(F, G)$ under
consideration is estimable (i.e., there exists an unbiased estimator
$\phi = \phi(X_1, \ldots, X_r; Y_1, \ldots, Y_s)$ of $\theta$). $\phi$ is
a function of independent observations from two populations with
cumulative distribution functions $F(x)$ and $G(y)$. (Hence, it is
called a two-sample problem.) The functions $F$ and $G$ will be
restricted to be members of a specified class $D$ of pairs of
cumulative distribution functions, described in the context. The total
number of observations from the two populations X and Y will be a fixed
number $N$. The estimation procedure is carried out in two stages.
First, take $M$ observations from each of the populations; then,
allocate the remaining $N - 2M$ observations to the same populations.
The method of allocation utilizes the information from the first stage
observations.

A two-stage estimator, represented by $U'$, will be introduced. It is a
U-statistic with random sample sizes. (See [4] on general U-statistics.
$U'$ is defined in Section 3.) One of the main results (presented in
Section 4) is that, under certain conditions, the variance of $U'$
approaches asymptotically a particular variance $V_0$. This particular
$V_0$ (defined in Section 2) is the minimized asymptotic variance of a
one-stage estimator $U$. In other words, it is computed (see Section 2)
when the best one-stage allocation of $N$ observations to the two
populations is made with the help of partial or even complete
information about the distributions $F$ and $G$. Such information about
$F$ and $G$ is represented by the "nuisance parameters"
$b_{10} = b_{10}(F, G)$, $b_{01} = b_{01}(F, G)$, etc., defined in
Section 2. Thus, in particular, $V_0$ can be computed only when $b_{10}$
and $b_{01}$ are known. Moreover, using these parameters, it will be
shown in Section 2 that $V_0$ is the smallest among the variances of all
one-stage estimators of $\theta$. However, no prior knowledge of $b_{10}$
and $b_{01}$ is required to compute $\mathrm{Var}(U')$, and it will be
proved in Section 4 that $\mathrm{Var}(U')/V_0$ converges to unity as $N$
approaches infinity.

A brief review of some basic properties of one-stage U-statistics, as
well as some conventions on notation, will also be presented in
Section 2.

In Section 5, the "optimal" choice of the first stage sample size $M$
relative to the fixed total sample size $N$ is discussed. Three cases
with different conditions on the unbiased estimator $\phi$ will be
considered. In each case, it is found that the "optimal" choice depends
on the specific conditions. (For details, see Section 5.)

Section 6 contains some examples. Here, for each $\theta(F, G)$, the
corresponding estimators for $b_{10}$ and $b_{01}$, together with their
behavior under different conditions on $F$ and $G$, will be given. The
examples include cases in which the two-stage estimation procedure
described above can be applied as well as cases where it cannot be
applied.

Section 7 contains a proof of the asymptotic normality of $U'$.

In Section 8, it is indicated that this two-stage two-sample estimation
procedure can be extended to k-sample two-stage estimation with similar
results for $k > 2$.

In the last section, Section 9, another two-stage two-sample estimator
$U''$ will be introduced. It is based on the combined observations of
both stages, $N$ observations in total, as compared to $U'$, which is
based on the second stage of $N - 2M$ observations only. $U''$ is biased
while $U'$ is unbiased. Since $U''$ is of a different nature as compared
to $U'$, the corresponding proofs are much more involved. The results on
$U''$ will be summarized, without proof, in this section.

The technique of two-stage estimation has been discussed in several
papers. Stein [11] has used a two-stage procedure to determine
confidence intervals of a pre-assigned length for the mean of a normal
population with unknown variance. Putter [7] used such a technique to
estimate the mean of a stratified normal population. Robbins [9]
discussed a two-stage procedure from the point of view of the design of
experiments. Later, Ghurye and Robbins [3] used a two-stage technique to
estimate the difference between the means of two normal populations (or
some other specified populations). Richter [8] discussed the estimation
of the common mean of two normal populations. The present paper
generalizes these two-stage procedures in two ways. First, the
underlying cumulative distributions $F$, $G$ are members of a larger
class of distributions. Secondly, the underlying parameters
$\theta(F, G)$ are not restricted to population means or functions of
means.

2. Some Basic Properties of One-stage U-statistics and Notations

Before formulating the problem, a short review of some basic properties
of U-statistics is given in this section, based on references [4, 10].
For convenience of presentation, some specific notations are adopted
here as well as throughout this paper:

(1) $K$ will be used as a generic constant, which may represent
different values according to the context.

(2) $\epsilon'$ will be used as any small positive real number. Its
value will be specified in various situations.

(3) Vectorial notations will be used such as:

$X_r = (X_1, \ldots, X_r)$, where $r = 1, 2, \ldots$

$Y_r = (Y_1, \ldots, Y_r)$

$X_{i_r} = (X_{i_1}, \ldots, X_{i_r})$

$X_{i_{k,j}} = (X_{i_{j+1}}, \ldots, X_{i_k})$.

Here, the subscripts of the coordinates are a permutation of some set
of integers, which will be specified in the context.

In order to give a definition of a two-sample one-stage U-statistic,
let us consider two populations X and Y with cumulative distribution
functions $F$ and $G$ respectively. Also, let us consider a real valued
estimable parameter $\theta = \theta(F, G)$.

By the statement that $\theta$ is estimable, we mean that there exists
a function $\phi(X_r; Y_s)$ such that, with the integration taken over
all values of the X's and Y's,

(2.1)  $\theta(F, G) = \int \cdots \int \phi(x_r; y_s)\, dF(x_1) \cdots dF(x_r)\, dG(y_1) \cdots dG(y_s)$.

Here, $X_r$ and $Y_s$ are $r$ and $s$ independent observations from
populations X and Y respectively. Moreover, all $F$'s and $G$'s are
restricted to be members of a specified class $D$ of pairs of
cumulative distribution functions of the populations X and Y.

Without loss of generality, the function $\phi$, called the kernel, can
be assumed to be symmetric in its X arguments and its Y arguments
separately. (See [4, 10].) Furthermore, since any function of $r$ X's
and $s$ Y's can be written as a function of $\max(r, s)$ X's and Y's,
we shall assume $r = s$.

Definition ([2, 4, 10])

A U-statistic associated with the parameter $\theta$ and the kernel
$\phi$, defined as above, in a sample of $m$ observations on population
X and $n$ observations on population Y, for $m, n \ge r$, is defined as:

(2.2)  $U_{m,n} = U(X_m; Y_n) = \binom{m}{r}^{-1} \binom{n}{r}^{-1} \sum \phi(X_{i_r}; Y_{j_r})$

where the summation is taken over all sets of integers such that
$1 \le i_1 < \cdots < i_r \le m$; $1 \le j_1 < \cdots < j_r \le n$.
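As an illustration of definition (2.2), the following is a minimal sketch that averages a kernel over all $r$-subsets of each sample. The function names and the indicator kernel (with $r = 1$, so that $\theta = \Pr(X < Y)$) are assumptions chosen for the example and do not come from the paper.

```python
from itertools import combinations
from math import comb


def two_sample_u_statistic(xs, ys, kernel, r):
    """Average kernel(x_subset, y_subset) over all r-subsets of xs and of ys."""
    m, n = len(xs), len(ys)
    if m < r or n < r:
        raise ValueError("need at least r observations from each population")
    total = 0.0
    for x_sub in combinations(xs, r):      # all i_1 < ... < i_r
        for y_sub in combinations(ys, r):  # all j_1 < ... < j_r
            total += kernel(x_sub, y_sub)
    return total / (comb(m, r) * comb(n, r))


def indicator_kernel(x_sub, y_sub):
    """Hypothetical kernel with r = 1: indicator of X < Y."""
    return 1.0 if x_sub[0] < y_sub[0] else 0.0


u = two_sample_u_statistic([1.0, 4.0, 6.0], [2.0, 5.0], indicator_kernel, r=1)
```

The brute-force double loop visits every pair of subsets, so it is only practical for small samples; it is meant to mirror the definition rather than to be efficient.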

Now, in order to write the variance of $U_{m,n}$, we define, for
$c, d = 0, 1, 2, \ldots, r$,

(2.3)  $\phi^*(X_{i_r}; Y_{j_r}) = \phi(X_{i_r}; Y_{j_r}) - \theta$,

       $\phi^*_{cd}(x_c; y_d) = E[\phi^*(x_c, X_{r,c}; y_d, Y_{r,d})]$,

i.e., $\phi^*_{cd}$ is the conditional expected value of $\phi^*$ given
$x_c$ and $y_d$. Note that $\phi^*_{00} = 0$. Also define

(2.4)  $b_{cd} = E[\phi^*_{cd}(X_c; Y_d)]^2$.

It can be deduced [2, p. 224, p. 257] that

$b_{cd} = \mathrm{cov}[\phi(X_{i_r}; Y_{j_r}),\ \phi(X_{k_r}; Y_{t_r})]$

where $(i_1, \ldots, i_r)$ and $(k_1, \ldots, k_r)$ are any two sets of
$r$ distinct integers from $(1, 2, \ldots, m)$ and $c$ is the number of
integers common to the two sets; $(j_1, \ldots, j_r)$ and
$(t_1, \ldots, t_r)$ are any two sets of $r$ distinct integers from
$(1, 2, \ldots, n)$ and $d$ is the number of integers common to the two
sets. Then, the variance of $U_{m,n}$ can be expressed [2, p. 277] as:

(2.5)  $\mathrm{Var}(U_{m,n}) = \binom{m}{r}^{-1} \binom{n}{r}^{-1} \sum_{c=0}^{r} \sum_{d=0}^{r} \binom{r}{c} \binom{m-r}{r-c} \binom{r}{d} \binom{n-r}{r-d}\, b_{cd}$.


Next, according to Fraser [2], the class $D$ of pairs of cumulative
distributions, $F(X)$ and $G(Y)$, for $U_{m,n}$ may be taken to consist
of all distributions uniform within intervals. (For definition, see [2].
Particular examples are: a) a class of pairs of absolutely continuous
distribution functions or b) a class of pairs of discrete distribution
functions.) Then, an important theorem regarding the variance of
$U_{m,n}$ is also given by Fraser [2, Theorem 7.1, p. 28 and Theorems
2.1, 2.2, p. 142].

Fraser's Theorem

If the class of pairs of distribution functions includes all
distributions uniform within intervals, mentioned above, then for
$m, n \ge r$, $U_{m,n}$ is the unique minimum variance unbiased
estimator.

Rosenblatt [10] has obtained the following Lemmas:

Rosenblatt's Lemma 2.4  For $1 \le c \le g \le r$; $1 \le d \le h \le r$,
one has

(2.6)  $g\, b_{c0} \le c\, b_{g0}$,  $h\, b_{0d} \le d\, b_{0h}$, and

(2.7)  $0 \le L_{cd} \le \frac{cd}{gh} L_{gh}$, where

$L_{cd} = b_{cd} - b_{c0} - b_{0d} = E[\phi^*_{cd}(X_c; Y_d) - \phi^*_{c0}(X_c) - \phi^*_{0d}(Y_d)]^2$.

Rosenblatt's Lemma 2.5  $\mathrm{Var}(U_{m,n})$ has the following upper
and lower bounds:

(2.8)  $\mathrm{Var}(U_{m,n}) \ge \frac{r^2}{m} b_{10} + \frac{r^2}{n} b_{01} + \frac{r^4}{mn} L_{11}$,

(2.9)  $\mathrm{Var}(U_{m,n}) \le \frac{r}{m} b_{r0} + \frac{r}{n} b_{0r} + \frac{r^2}{mn} L_{rr}$.


In the above discussion, concerning $U_{m,n}$ only, it is assumed that
$m$ and $n$ are fixed numbers. Now, if $m$ and $n$ are not fixed but the
total number of observations on populations X and Y is restricted to be
a fixed number $N$, i.e., $m + n = N$, we shall denote such a two-sample
statistic by $U$ instead of $U_{m,n}$. Using the quantities $b_{10}$,
$b_{01}$ as defined in (2.4), the following statement can be made on the
lower bound of the variance of $U$.

Rosenblatt's Lemma 2.6  If the ratio $m/n$ satisfies

$0 < a_1 \le m/n \le a_2 < \infty$, as $m, n \to \infty$,

then

$\mathrm{Var}(U) \ge (r^2/m)\, b_{10} + (r^2/n)\, b_{01} = V'$, say,

i.e., $V'$ is the lower bound of $\mathrm{Var}(U)$ and $V'$ is actually
the asymptotic variance of $U$.

Now, $V'$ as defined above can be minimized by selecting the best
values of $m$ and $n$ subject to $m + n = N$, and $m, n \ge r$. One finds
that the best choices are

(2.10)  $m_0 = N\, b_{10}^{1/2} / [b_{10}^{1/2} + b_{01}^{1/2}] = NQ$, say, and
        $n_0 = N - m_0 = N(1 - Q)$.

These values for the sample sizes represent the best allocation of $N$
observations to the populations X and Y. They depend, however, on the
unknowns $b_{10}$ and $b_{01}$, which represent partial information
about the distributions $F(X)$ and $G(Y)$ and have been assumed to be
positive quantities. In other words, these sample sizes can be computed
and the corresponding U-statistic can be constructed only when $b_{10}$
and $b_{01}$ are positive and known.

The minimum value of $V'$, denoted by $V_0$, is found to be

(2.11)  $V_0 = N^{-1} [r\, b_{10}^{1/2} + r\, b_{01}^{1/2}]^2 = V'(m_0, n_0)$.
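The allocation (2.10) and the minimum (2.11) can be checked numerically. The sketch below assumes that $b_{10}$ and $b_{01}$ are known positive numbers; the function names and the numerical values are illustrative assumptions only.

```python
from math import sqrt


def optimal_one_stage_allocation(b10, b01, N, r):
    """Return (Q, m0, n0, V0) as in (2.10) and (2.11); b10, b01 must be positive."""
    if b10 <= 0 or b01 <= 0:
        raise ValueError("b10 and b01 must both be positive")
    Q = sqrt(b10) / (sqrt(b10) + sqrt(b01))
    m0 = N * Q
    n0 = N - m0
    V0 = (r * sqrt(b10) + r * sqrt(b01)) ** 2 / N
    return Q, m0, n0, V0


def v_prime(b10, b01, m, n, r):
    """Asymptotic variance V' = (r^2/m) b10 + (r^2/n) b01 of Lemma 2.6."""
    return r ** 2 * b10 / m + r ** 2 * b01 / n


# Illustrative values (assumed): b10 = 4, b01 = 1, N = 300, r = 1.
Q, m0, n0, V0 = optimal_one_stage_allocation(b10=4.0, b01=1.0, N=300, r=1)
v_at_optimum = v_prime(4.0, 1.0, m0, n0, r=1)
v_at_equal_split = v_prime(4.0, 1.0, 150, 150, r=1)
```

With these values $Q = 2/3$, so two thirds of the observations go to the population with the larger nuisance parameter, and $V'$ evaluated at $(m_0, n_0)$ agrees with $V_0$ while the equal split gives a larger value.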

It is clear that $V_0$ is at least as small as the variance of any
estimator of $\theta$ based on U-statistics subject to the restriction
that $m + n = N$. Hence, $V_0$ is the minimized asymptotic variance of
$U$, when the best allocation of $N$ observations to populations X and
Y is made. It will be used as a basis for comparison in the remaining
sections. In particular, it will be shown that there exist two-stage
two-sample statistics, say $U'$, such that $\mathrm{Var}(U')/V_0$
converges to unity as $N$ approaches infinity, even though no prior
knowledge of $b_{10}$ and $b_{01}$ is required to compute
$\mathrm{Var}(U')$.

3. Formulation of the Problem; the Two-stage Procedure and the
Estimator

In this section, a two-stage statistic $U'$ will be defined. The major
result of the investigation on $U'$, which will be presented in
Section 4, is to show that with large samples and under certain
conditions the variance of $U'$ approaches $V_0$ of equation (2.11). No
prior knowledge of $b_{10}$ and $b_{01}$ is required to obtain $U'$.

Definition of $U'$

Let the total number of observations from populations X and Y be fixed
at $N$ where $N \ge 6r$. At the first stage, $M$ observations are made on
each of the two populations, where $M \ge 2r$ and $2M \le N - 2r$. From
these $2M$ observed values, we shall estimate the parameters $b_{10}$,
$b_{01}$. It is observed from (2.1) and (2.4) that $b_{10}$ and $b_{01}$
are estimable functions [4]. There exist two associated U-statistics,
called $T_{10}$ and $T_{01}$, which are unbiased estimators of $b_{10}$
and $b_{01}$ respectively. The symmetric kernels of these two statistics
are functions of $2r$ X's and $2r$ Y's. Thus one can express $T_{10}$,
$T_{01}$ as follows:

(3.1)  $T_{10} = \binom{M}{2r}^{-1} \binom{M}{2r}^{-1} \sum h(X_{i_{2r}}; Y_{j_{2r}})$

(3.2)  $T_{01} = \binom{M}{2r}^{-1} \binom{M}{2r}^{-1} \sum g(X_{i_{2r}}; Y_{j_{2r}})$

where the summations are taken over all sets of integers,
$1 \le i_1 < \cdots < i_{2r} \le M$; $1 \le j_1 < \cdots < j_{2r} \le M$.

In analogy with (2.10), we define

(3.3)  $Z = T_{10}^{1/2} [T_{01}^{1/2} + T_{10}^{1/2}]^{-1}$, for $T_{10}$, $T_{01}$ positive,

       $Z = 0$ otherwise.

After $T_{10}$, $T_{01}$ and $Z$ are computed, the second stage is
constructed by taking $m'$ more observations on population X and $n'$
more observations on population Y with $m' + n' = N - 2M = N'$, where
the sample sizes $m'$ and $n'$ are determined as follows:

(3.4)  $m' = [N'Z]$     when  $r/N' \le Z \le (N' - r)/N'$,

       $m' = r$         when  $Z < r/N'$,

       $m' = N' - r$    when  $Z > (N' - r)/N'$,

and

       $n' = N' - m'$,

where $[a]$ is the largest integer contained in $a$.
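The allocation rule (3.3) and (3.4) can be sketched as two small functions. The names below are assumptions for illustration; the three cases of (3.4) are written as a single clamp of $[N'Z]$ to the interval $[r, N' - r]$, which gives the same result as the case distinction and also guards against floating-point rounding at the boundaries.

```python
from math import floor, sqrt


def allocation_fraction(T10, T01):
    """Z of (3.3): estimated share of second-stage observations for population X."""
    if T10 > 0 and T01 > 0:
        return sqrt(T10) / (sqrt(T01) + sqrt(T10))
    return 0.0


def second_stage_sizes(Z, N, M, r):
    """Return (m', n') of (3.4) with N' = N - 2M second-stage observations."""
    N_prime = N - 2 * M
    if N_prime < 2 * r:
        raise ValueError("need N - 2M >= 2r")
    # Clamping [N'Z] to [r, N' - r] reproduces the three cases of (3.4).
    m_prime = min(max(floor(N_prime * Z), r), N_prime - r)
    return m_prime, N_prime - m_prime


# Illustrative values (assumed): T10 = 4, T01 = 1, N = 100, M = 10, r = 1.
Z = allocation_fraction(4.0, 1.0)
sizes = second_stage_sizes(Z, N=100, M=10, r=1)
```

When $Z = 0$ (the estimates are not both positive) the rule still leaves $r$ observations for population X, so the second-stage U-statistic remains well defined.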

With $m'$ and $n'$ so defined, the statistic $U'$ will be defined as the
estimator of $\theta$ (see equations (2.1) and (2.2)) based on $m'$ and
$n'$ observations on populations X and Y respectively.

(3.5)  $U' = \binom{m'}{r}^{-1} \binom{n'}{r}^{-1} \sum \phi(X_{i_r}; Y_{j_r})$

where the summation is taken over all sets of integers,
$M + 1 \le i_1 < \cdots < i_r \le M + m'$; $M + 1 \le j_1 < \cdots < j_r \le M + n'$.

In other words, $U'$ is explicitly a function of the second stage
observations only. However, the sample sizes $m'$ and $n'$ are in turn
explicit functions of the first stage observations. Hence, implicitly,
$U'$ depends on both stages.
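To make the dependence on both stages concrete, here is a minimal simulation sketch for one special case that is an assumption of this example, not a construction taken from the paper: $\theta = E(X) - E(Y)$ with kernel $\phi(x; y) = x - y$ and $r = 1$. In that case $b_{10} = \mathrm{Var}(X)$ and $b_{01} = \mathrm{Var}(Y)$, the first-stage sample variances serve as unbiased estimates $T_{10}$, $T_{01}$, and $U'$ reduces to a difference of second-stage sample means. The sampling functions and parameter values are hypothetical.

```python
import random
from math import floor, sqrt
from statistics import mean, variance


def two_stage_mean_difference(draw_x, draw_y, N, M):
    """Two-stage estimate of E(X) - E(Y) with kernel x - y (r = 1)."""
    r = 1
    # First stage: M observations from each population, used only for allocation.
    first_x = [draw_x() for _ in range(M)]
    first_y = [draw_y() for _ in range(M)]
    T10, T01 = variance(first_x), variance(first_y)  # unbiased sample variances
    Z = sqrt(T10) / (sqrt(T01) + sqrt(T10)) if T10 > 0 and T01 > 0 else 0.0
    # Second stage: split N' = N - 2M observations according to (3.4).
    N_prime = N - 2 * M
    m_prime = min(max(floor(N_prime * Z), r), N_prime - r)
    n_prime = N_prime - m_prime
    second_x = [draw_x() for _ in range(m_prime)]
    second_y = [draw_y() for _ in range(n_prime)]
    # The estimate uses second-stage observations only, as in (3.5).
    return mean(second_x) - mean(second_y), m_prime, n_prime


rng = random.Random(0)
estimate, m_prime, n_prime = two_stage_mean_difference(
    lambda: rng.gauss(1.0, 3.0),  # hypothetical population X: mean 1, s.d. 3
    lambda: rng.gauss(0.0, 1.0),  # hypothetical population Y: mean 0, s.d. 1
    N=400,
    M=50,
)
```

Because population X has the larger standard deviation in this setup, most of the 300 second-stage observations are allocated to it.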

Finally, notice that the allocation of $N'$ observations in (3.4) is the
same as that of (2.10) with $Z$ in place of $Q$. It will be shown in
Lemma 4.2 that if $M \to \infty$, then $Z \to Q$ in probability.
Consequently, the probability of the first case of (3.4) occurring
approaches unity and the contribution of the other two cases to the
variance of $U'$ will be negligible, as $N' \to \infty$. Thus one may
dispose of the other two cases and replace (3.4) by $m' = N'Z$ and
$n' = N'(1 - Z)$. (Note that the brackets of (3.4) for $m'$ and $n'$ will
be left out hereafter, since their contribution to the variance of $U'$
is also negligible, as $N' \to \infty$.)

REMARK: In the two-stage procedure, equal numbers of observations on
populations X and Y are used at the first stage. Intuitively, when
$r = s$ occurs in the kernel $\phi$ in a natural way (i.e., no argument
of $\phi$ is identically zero), and there is no information about the
relative sizes of $b_{10}$ and $b_{01}$, equal size samples seem
appropriate to the symmetry of the situation. When $r \ne s$, but one
writes $\phi$ as a function of $\max(r, s)$ X's and Y's, one might doubt
the appropriateness of the equal sample sizes at the first stage.

4.  Asymptotic  Efficiency  of  the  Estimator 

It is mentioned in Section 3 that if $N \to \infty$ with $M \to \infty$
and $N' \to \infty$, then the second stage sample sizes (3.4) can be
replaced by:

(4.1)  $m' = N'Z$;  $n' = N'(1 - Z)$

where $b_{10}$ and $b_{01}$ are assumed to be positive.

In this section, it will be shown that under certain conditions, the
ratio between the variance of $U'$ (defined in Section 3) and $V_0$ will
asymptotically approach unity. (Recall that $V_0$ is the smallest of the
variances of any one-stage U-statistic estimator of $\theta$ subject to
the restriction that $m + n = N$. $V_0$ can be computed only when
$b_{10}$ and $b_{01}$ are known and the best one-stage allocation of $N$
observations to populations X and Y is made.) The proofs are presented
in Theorem 4.1. First, $U'$ is shown to be unbiased. Then, in Theorem
4.1, $N\,\mathrm{Var}(U')$ is partitioned into two parts, namely, when
$|Z - Q|$ is less than $M^{-p}$ for any $p$ within the range
$0 < p < 1/4$; and when $|Z - Q|$ is greater than $M^{-p}$. (Recall that
$Q$ gives the best allocation of $N$ observations and the basis for
evaluating $V_0$, see (2.10).) By the results of Lemmas 4.1 and 4.2, it
is concluded that the second part is of the order of magnitude of
$O(M^{-2+4p} N)$. The first part of $N\,\mathrm{Var}(U')$ is shown to be
of the order of magnitude of
$[r\, b_{10}^{1/2} + r\, b_{01}^{1/2}]^2 + O(M N^{-1}) + O(M^{-p})$. The
first term of this expression is equal to $N V_0$. Now, under certain
assumptions (see Theorem 4.1 below) concerning the relative order of
magnitude of $M$ and $N$ and $0 < p < 1/4$, it will be shown that
$O(M^{-2+4p} N)$, $O(M/N)$ and $O(M^{-p})$ converge to zero as $N$
approaches infinity. Hence the ratio $\mathrm{Var}(U')/V_0$ converges to
unity, which is the result of Theorem 4.1.

In Section 5, it will be shown that the best choice of $M$ (under the
assumptions of Theorem 4.1) is equal to $K N^{6/7}$, where $K$ is a
non-zero unknown constant. The resulting value of $p$ is 1/6. Thus the
ratio of $\mathrm{Var}(U')$ to $V_0$ is equal to $1 + O(N^{-1/7})$.

Lemma 4.1  Let $\theta(F, G) = \theta$ be an estimable parameter with
symmetric kernel $S = S(X_r; Y_r)$. Let $W$ be the associated
U-statistic with $M$ observations on each of the populations X and Y
with cumulative distribution functions $F(X)$ and $G(Y)$ respectively.
Assume that the $2i$th moment of the kernel is finite. Define:
a) $W^* = W - \theta$, b) $S^* = S - \theta$ and
c) $S^*(X_{rt+r,rt}; Y_{rt+r,rt}) = S'_t$,
then for any positive integer $i$, with $k = M/r$,

$E(W^*)^{2i} \le \frac{(2i)!}{i!\, 2^i\, k^i} [E(S^*)^2]^i + O(k^{-i-1}) = O(M^{-i})$.


Proof: For convenience, again let $r = s$. Also define:

$W'' = \frac{1}{k} \sum_{t=0}^{k-1} S^*(X_{rt+r,rt}; Y_{rt+r,rt}) = \frac{1}{k} \sum_{t=0}^{k-1} S'_t$, where $k = M/r$.

$W''$ is an average of $k$ independent and identically distributed
random variables with mean zero. From the work of Tchouproff [12], one
has

$E(W'')^{2i} = \frac{1}{k^{2i}} E\left(\sum_{t=0}^{k-1} S'_t\right)^{2i}$

$= \frac{1}{k^{2i}} \left[\frac{(2i)!}{i!\, 2^i}\, k(k-1) \cdots (k-i+1)\, E(S')^2 \cdots E(S')^2 + \cdots\right]$

$= \frac{(2i)!}{i!\, 2^i\, k^i} [\mathrm{Var}(S')]^i + O(k^{-i-1})$

$= O(M^{-i})$.

We now prove that $W^*$ can be written in terms of $W''$ as follows:

(4.2)  $W^* = (M!)^{-2} \sum W''(X_{h_M}; Y_{j_M})$,

where the summation is taken over all permutations $(h_1, \ldots, h_M)$,
$(j_1, \ldots, j_M)$ of $(1, 2, \ldots, M)$. Starting with the right side
of (4.2),

$(M!)^{-2} \sum W'' = (M!)^{-2} \sum \frac{1}{k} \sum_{t=0}^{k-1} S^*(X_{h_{rt+r,rt}}; Y_{j_{rt+r,rt}})$

$= \frac{1}{k} \sum_{t=0}^{k-1} (M!)^{-2} \sum S^*(X_{h_{rt+r,rt}}; Y_{j_{rt+r,rt}})$

$= \frac{1}{k} \sum_{t=0}^{k-1} \binom{M}{r}^{-2} \sum{}' S^*(X_{h_{rt+r,rt}}; Y_{j_{rt+r,rt}})$

where $\sum'$ is taken over all sets of integers,
$1 \le h_{rt+1} < \cdots < h_{rt+r} \le M$; $1 \le j_{rt+1} < \cdots < j_{rt+r} \le M$,
for all $t = 0, 1, \ldots, k-1$.

Then,

$(M!)^{-2} \sum W'' = \frac{1}{k} \sum_{t=0}^{k-1} W^* = W^*$.

Next, since

$(W^*)^{2i} = [(M!)^{-2} \sum W'']^{2i} \le (M!)^{-2} \sum (W'')^{2i}$,

one has

$E(W^*)^{2i} \le E(W'')^{2i} = \frac{(2i)!}{i!\, 2^i\, k^i} [\mathrm{Var}(S')]^i + O(k^{-i-1})$

$= O(k^{-i}) = O(M^{-i})$.

The lemma is proved.

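Lemma 4.2  Let $Z$ and $Q$ be defined as in (3.3) and (2.10)
respectively. Assume for $0 < p < 1/2$, $i$ an integer, $i \ge 2$, that
$\phi$ has finite moments up to order $4i$. Then

$\Pr[|Z - Q| > M^{-p}] = O(M^{-i+2ip})$.

Proof: Write

$\Pr[|Z - Q| > M^{-p}] = \Pr[|Z - Q| > M^{-p};\ T_{10}, T_{01} > 0]$

$\quad + \Pr[|Z - Q| > M^{-p};\ T_{10}, T_{01} \text{ not both positive}]$

$\le \Pr\left[\frac{T_{10}^{1/2}}{T_{01}^{1/2} + T_{10}^{1/2}} > Q + M^{-p}\right] + \Pr\left[\frac{T_{10}^{1/2}}{T_{01}^{1/2} + T_{10}^{1/2}} < Q - M^{-p}\right]$

$\quad + \Pr[T_{10} \le 0] + \Pr[T_{01} \le 0]$.

Write,

$\Pr\left[\frac{T_{10}^{1/2}}{T_{01}^{1/2} + T_{10}^{1/2}} > Q + M^{-p}\right]$

$= \Pr[T_{10}^{1/2} > (Q + M^{-p})\, T_{10}^{1/2} + (Q + M^{-p})\, T_{01}^{1/2}]$

$= \Pr\left[T_{10} - \left(\frac{Q + M^{-p}}{1 - Q - M^{-p}}\right)^2 T_{01} > 0\right]$

$= \Pr\left\{\left[T_{10} - \left(\frac{Q + M^{-p}}{1 - Q - M^{-p}}\right)^2 T_{01}\right] - \left[b_{10} - \left(\frac{Q + M^{-p}}{1 - Q - M^{-p}}\right)^2 b_{01}\right] > -\left[b_{10} - \left(\frac{Q + M^{-p}}{1 - Q - M^{-p}}\right)^2 b_{01}\right]\right\}$.

One notices that

$\left[T_{10} - \left(\frac{Q + M^{-p}}{1 - Q - M^{-p}}\right)^2 T_{01}\right] - \left[b_{10} - \left(\frac{Q + M^{-p}}{1 - Q - M^{-p}}\right)^2 b_{01}\right]$

is a U-statistic with mean zero and its kernel is

$\left[h - \left(\frac{Q + M^{-p}}{1 - Q - M^{-p}}\right)^2 g\right] - \left[b_{10} - \left(\frac{Q + M^{-p}}{1 - Q - M^{-p}}\right)^2 b_{01}\right]$,

where $h$, $g$ are defined in (3.1), (3.2) respectively. Since $b_{10}$
and $b_{01}$ are assumed positive, one has $0 < Q < 1$. Also, for $M$
large, one can choose $M^{-p} < \min(Q, 1 - Q)$. Thus

$-\left[b_{10} - \left(\frac{Q + M^{-p}}{1 - Q - M^{-p}}\right)^2 b_{01}\right]$

$= -b_{10} + \left(\frac{Q}{1 - Q}\right)^2 \left[1 + \frac{M^{-p}}{Q(1 - Q)} + o(M^{-p})\right]^2 b_{01}$

$= -b_{10} + b_{10} \left[1 + \frac{2 M^{-p}}{Q(1 - Q)} + o(M^{-p})\right]$

$= \frac{2 M^{-p}\, b_{10}}{Q(1 - Q)} + o(M^{-p})$.

Hence the last quantity is positive for small $M^{-p}$. Using
Tchebycheff's inequality of the form
$\Pr[|x| > a] \le a^{-2i} E(x)^{2i}$, one has

$\Pr\left\{\left[T_{10} - \left(\frac{Q + M^{-p}}{1 - Q - M^{-p}}\right)^2 T_{01}\right] - \left[b_{10} - \left(\frac{Q + M^{-p}}{1 - Q - M^{-p}}\right)^2 b_{01}\right] > -\left[b_{10} - \left(\frac{Q + M^{-p}}{1 - Q - M^{-p}}\right)^2 b_{01}\right]\right\}$

$\le \frac{E\left\{\left[T_{10} - \left(\frac{Q + M^{-p}}{1 - Q - M^{-p}}\right)^2 T_{01}\right] - \left[b_{10} - \left(\frac{Q + M^{-p}}{1 - Q - M^{-p}}\right)^2 b_{01}\right]\right\}^{2i}}{\left[2 M^{-p}\, b_{10}\, [Q(1 - Q)]^{-1} + o(M^{-p})\right]^{2i}}$,

where the numerator is $O(M^{-i})$ by Lemma 4.1, so that the bound is
equal to $O(M^{-i+2ip})$ for $M$ large.

Analogously, one gets,

$\Pr\left\{\frac{T_{10}^{1/2}}{T_{01}^{1/2} + T_{10}^{1/2}} < Q - M^{-p}\right\}$

$= \Pr\left\{\left[\left(\frac{Q - M^{-p}}{1 - Q + M^{-p}}\right)^2 T_{01} - T_{10}\right] - \left[\left(\frac{Q - M^{-p}}{1 - Q + M^{-p}}\right)^2 b_{01} - b_{10}\right] > -\left[\left(\frac{Q - M^{-p}}{1 - Q + M^{-p}}\right)^2 b_{01} - b_{10}\right]\right\}$

$\le \frac{E\left\{\left[\left(\frac{Q - M^{-p}}{1 - Q + M^{-p}}\right)^2 T_{01} - T_{10}\right] - \left[\left(\frac{Q - M^{-p}}{1 - Q + M^{-p}}\right)^2 b_{01} - b_{10}\right]\right\}^{2i}}{\left[2 M^{-p}\, b_{10}\, [Q(1 - Q)]^{-1} + o(M^{-p})\right]^{2i}}$

$= O(M^{-i+2ip})$.

Similarly, using Lemma 4.1,

$\Pr(T_{10} \le 0) = \Pr(T_{10} - b_{10} \le -b_{10}) \le \Pr(|T_{10} - b_{10}| \ge b_{10})$

$\le \frac{E(T_{10} - b_{10})^{2i}}{b_{10}^{2i}} = O(M^{-i})$, and

$\Pr(T_{01} \le 0) = O(M^{-i})$.

Therefore, $\Pr(|Z - Q| > M^{-p}) = O(M^{-i+2ip}) + O(M^{-i}) = O(M^{-i+2ip})$.
Lemma 4.2 is proved.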

Theorem 4.1  $U'$ is an unbiased estimator of $\theta$, i.e.,
$E(U') = \theta$. Also, if

(i) $\lim_{N \to \infty} N / M^{\beta}$ exists and is finite for some
$\beta$ such that $1 < \beta < 2$,

(ii) the eighth moment of $\phi$ is finite, and

(iii) $b_{10}$, $b_{01}$ are both positive,

then

$\lim_{N \to \infty} \frac{\mathrm{Var}(U')}{V_0} = \lim_{N \to \infty} \frac{E_{m'}[\mathrm{Var}(U'_{m'})]}{V_0} = 1$.

REMARK: In most non-parametric problems, the kernel $\phi$ is bounded,
hence all moments exist. Therefore, the restriction (ii) is not severe.
$\mathrm{Var}(U'_{m'})$ denotes the conditional variance of $U'$ given
$m'$ and $n'$, and $\mathrm{Var}(U')$ denotes the expected value of
$\mathrm{Var}(U'_{m'})$, where the expectation is over $m'$ and $n'$, or
$\mathrm{Var}(U')$ is the unconditional variance of $U'$.

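Proof: Notice that $m'$, $n'$ are defined to be greater than or equal to
$r$, and that they are functions of $X_1, \ldots, X_M$;
$Y_1, \ldots, Y_M$ only. On the other hand, all the arguments of
$\phi(X_{i_r}; Y_{j_r})$ in the definition of $U'$ (see (3.5)) are
functions of $X_{M+1}, \ldots, X_{M+m'}$; $Y_{M+1}, \ldots, Y_{M+n'}$.
Thus the arguments of $U'$ are independent of $X_1, \ldots, X_M$;
$Y_1, \ldots, Y_M$. Therefore,

$E(U') = E_{m'} E(U'_{m'})$

$= E_{m'}\left[\binom{m'}{r}^{-1} \binom{n'}{r}^{-1} \sum E\,\phi(X_{i_r}; Y_{j_r})\right]$

$= E_{m'}\left[\binom{m'}{r}^{-1} \binom{n'}{r}^{-1} \binom{m'}{r} \binom{n'}{r}\, \theta\right]$

$= \theta$.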


Hence, $U'$ is unbiased.

Now, let $C$ = condition that $|Z - Q| \le M^{-p}$,
$C'$ = complement of $C$,
$0 < p < 1/4$ (see Lemma 4.2 for $i = 2$).

$N\,\mathrm{Var}(U') = N\, E_{m'}(\mathrm{Var}(U'_{m'}))$

$= N \Pr(|Z - Q| \le M^{-p})\, E_{m' \in C}\, \mathrm{Var}(U'_{m'})$

$\quad + N \Pr(|Z - Q| > M^{-p})\, E_{m' \in C'}\, \mathrm{Var}(U'_{m'})$.

Using the fact that
$E_{m' \in C'}\, \mathrm{Var}(U'_{m'}) \le \mathrm{Var}(U_{r,r}) = b_{rr} = \mathrm{Var}(\phi)$,
which is bounded by assumption (ii), and

$\Pr(|Z - Q| > M^{-p}) = O(M^{-2+4p})$, $0 < p < 1/4$,

by Lemma 4.2, one obtains

$N\,\mathrm{Var}(U') \le N\, E_{m' \in C}\, \mathrm{Var}(U'_{m'}) + O(N M^{-2+4p})$.

It is easy to show that there exists a number $A$ which is independent
of $m'$, $n'$, such that

$\mathrm{Var}(U'_{m'}) \le (r^2 b_{10})/m' + (r^2 b_{01})/n' + A/\min(m'^2, n'^2)$.

The procedure is to expand the terms of $\mathrm{Var}(U'_{m'})$ and its
combinatorials and then to substitute a fraction by unity.
Consequently, one finds that the first two terms are less than or equal
to $(r^2 b_{10})/m'$ and $(r^2 b_{01})/n'$ respectively. For the rest of
the $r \times r$ terms, we substitute again a certain fraction by unity
and find that each term is at most its numerator divided by
$\min(m'^2, n'^2)$. Hence we find that $A$ may be taken as the sum of all
the $r \times r$ values in their numerators, which are composed of $r$'s
and $b_{cd}$'s, $c, d = 1, 2, \ldots, r$.


One has

$E_{m' \in C}\, \mathrm{Var}(U'_{m'}) \le E_{m' \in C}\left[\frac{r^2 b_{10}}{m'} + \frac{r^2 b_{01}}{n'} + \frac{A}{\min(m'^2, n'^2)}\right]$.

Also, when $|Z - Q| \le M^{-p}$ and $N \to \infty$, $m'$, $n'$ can be
written as $m' \ge N'(Q - M^{-p})$, $n' \ge N'(1 - Q - M^{-p})$. Thus,

$N\,\mathrm{Var}(U') \le \frac{N r^2 b_{10}}{N'(Q - M^{-p})} + \frac{N r^2 b_{01}}{N'(1 - Q - M^{-p})} + \frac{N A}{\min(N'^2 (Q - M^{-p})^2,\ N'^2 (1 - Q - M^{-p})^2)} + O(M^{-2+4p+\beta})$

$= (N/N') [r\, b_{10}^{1/2} + r\, b_{01}^{1/2}]^2 [1 + 2 M^{-p} + o(M^{-p})] + O(N N'^{-2}) + O(M^{-2+4p+\beta})$

$= [1 + 2 M N^{-1} + o(M N^{-1})] [r\, b_{10}^{1/2} + r\, b_{01}^{1/2}]^2 [1 + 2 M^{-p} + o(M^{-p})] + O(N N'^{-2}) + O(M^{-2+4p+\beta})$

$= [r\, b_{10}^{1/2} + r\, b_{01}^{1/2}]^2 [1 + O(N^{(1-\beta)/\beta}) + O(N^{-p/\beta}) + O(N^{(-2+4p+\beta)/\beta})]$,

after putting $M = K N^{1/\beta}$ where $K$ is an unknown non-zero
constant.

Since, by assumption (i), $1 < \beta < 2$, there exists $p$, so that
$0 < p < 1/4$, and $(-2 + 4p + \beta) < 0$. Finally,

$\lim_{N \to \infty} \frac{\mathrm{Var}(U')}{V_0} = \lim_{N \to \infty} \frac{N\,\mathrm{Var}(U')}{N V_0} = \lim_{N \to \infty} \frac{N\, E_{m'}\, \mathrm{Var}(U'_{m'})}{[r\, b_{10}^{1/2} + r\, b_{01}^{1/2}]^2}$

$= \lim_{N \to \infty} \frac{[r\, b_{10}^{1/2} + r\, b_{01}^{1/2}]^2}{[r\, b_{10}^{1/2} + r\, b_{01}^{1/2}]^2} [1 + O(N^{(1-\beta)/\beta}) + O(N^{-p/\beta}) + O(N^{(-2+4p+\beta)/\beta})]$

$= 1$.

Hence, the theorem is proved.

In addition to $U'$, other two-stage estimators of $\theta$ can be
defined. For example, if $\theta$ is estimated separately at both
stages, then one can combine these two estimates by weights. This paper
will not include any explicit discussion of such estimators. On the
other hand, the following one-stage statistic will be discussed.

Assume that $N$ observations are to be made, and that the $b_{cd}$'s are
unknown (except that $b_{10}$, $b_{01}$ are positive); then proceed as if
$b_{10} = b_{01}$. The variance of a one-stage U-statistic is minimized
with respect to $m$, subject to $m + n = N$, when

$m = N/2$, $n = N/2$.

Let the statistic be denoted by $U^*$; then its variance is given by

$\mathrm{Var}(U^*) = N^{-1}\, 2 r^2 (b_{10} + b_{01}) + O(N^{-2})$.

Hence,

(4.3)  $\lim_{N \to \infty} \frac{\mathrm{Var}(U^*)}{V_0} = \lim_{N \to \infty} \frac{N\,\mathrm{Var}(U^*)}{N V_0}$

       $= \lim_{N \to \infty} \left\{\frac{2 r^2 (b_{10} + b_{01})}{[r\, b_{10}^{1/2} + r\, b_{01}^{1/2}]^2} + O(N^{-1})\right\}$

       $= \frac{2 (b_{10} + b_{01})}{[b_{10}^{1/2} + b_{01}^{1/2}]^2} = \frac{2 (1 + \rho^2)}{(1 + \rho)^2}$

where

$\rho = (b_{10}/b_{01})^{1/2}$.

When $\rho$ approaches 0 or $\infty$, (4.3) approaches its maximum 2.
Thus, comparing the results of Theorem 4.1 with (4.3), an appreciable
decrease in variance can be obtained by using a two-stage procedure.
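The limiting ratio in (4.3) is easy to tabulate. The short sketch below evaluates $2(1 + \rho^2)/(1 + \rho)^2$ for a few values of $\rho$; the function name and the chosen values of $\rho$ are illustrative assumptions.

```python
def equal_allocation_variance_ratio(rho):
    """Limiting value of Var(U*)/V0 in (4.3), with rho = (b10/b01)^(1/2)."""
    return 2.0 * (1.0 + rho ** 2) / (1.0 + rho) ** 2


# Ratios for a few illustrative values of rho (assumed for this example).
ratios = {rho: equal_allocation_variance_ratio(rho) for rho in (0.0, 1.0, 3.0, 100.0)}
```

The ratio equals 1 when $\rho = 1$ (equal allocation is then already optimal), is 1.25 at $\rho = 3$, and approaches 2 as $\rho$ tends to 0 or to infinity.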

REMARK: For $s \ne r$, if we write $\phi$ as a function of $\max(r, s)$
X's and Y's, the choice of $m$, $n$ shall be $Nr/(s + r)$ and
$Ns/(s + r)$ respectively in order to minimize the variance of $U^*$
assuming $b_{10} = b_{01}$. A simple computation shows that the variance
ratio approaches $1 + s/r$ as $\rho$ approaches zero and approaches
$1 + r/s$ as $\rho$ approaches infinity. Thus the variance ratio may
have a maximum, for $r \ne s$, greater than 2.

5. "Optimal" Choice of the Value M Relative to N

"Optimal" choice of the value $M$ (sample size of the first stage)
relative to $N$ (total sample size) will be studied in this section for
the following three cases:

a) The first eight moments of the kernel $\phi$ exist

For this case, we proceed as follows. From the last step of the proof
of Theorem 4.1, one has

$\mathrm{Var}(U')/V_0 = 1 + O(M^{-(\beta - 1)}) + O(M^{-p}) + O(M^{-2+4p+\beta})$.

A heuristic method for finding the best $\beta$ and $p$ is to find the
solution of the pair of equations listed below, which are obtained by
examining the exponents in the remainder terms of the above equation.

(5.1)  $\beta - 1 = p$

(5.2)  $p = 2 - 4p - \beta$

and get $\beta = 7/6$, $p = 1/6$, thus $M = K N^{6/7}$.

Actually, this pair of values is the "optimal" solution, because it is
easy to see that any other choice will make one of the three terms have
a larger order of magnitude than $O(M^{-1/6})$ (or equivalently,
$O(N^{-1/7})$). Therefore, $\mathrm{Var}(U')/V_0 = 1 + O(N^{-1/7})$.

b) All moments of the kernel Φ exist

By Lemma 4.2 and Theorem 4.1, for general i, i ≥ 2, 0 < ρ < (i - 1)/2i, one has

    Var(U')/V_0 = 1 + O(M^{-(β-1)}) + O(M^{-ρ}) + O(M^{-i+2iρ+β}) .

Similar to the above case (for i = 2), one solves the two equations:

    (5.3)  β - 1 = ρ
    (5.4)  ρ = i - 2iρ - β .

It is found that β = (3i + 1)/2(1 + i) and ρ = (i - 1)/2(1 + i) is the set of solutions. When i approaches infinity, β approaches 3/2 and ρ approaches 1/2. Therefore, M = K(N^{2(1+i)/(3i+1)}), where 2(i + 1)/(3i + 1) has 2/3 as a lower bound. This bound, however, is not attained. Thus when Φ has all finite moments,

    Var(U')/V_0 = 1 + O(N^{-1+2(1+i)/(3i+1)}) for any i.
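The exponent balancing used in cases a) and b) can be checked with exact rational arithmetic. The sketch below is only illustrative: the function names are hypothetical, and it assumes (as the relation M = K(N^{6/7}) for β = 7/6 suggests) that N is proportional to M^β, so that a remainder of order M^{-ρ} is of order N^{-ρ/β}.

```python
from fractions import Fraction


def balanced_exponents(i):
    """Solve beta - 1 = rho and rho = i - 2*i*rho - beta exactly.

    Eliminating beta gives rho * (2 + 2*i) = i - 1.
    """
    i = Fraction(i)
    rho = (i - 1) / (2 * (1 + i))
    beta = 1 + rho
    # Check the second balancing equation with the solved values.
    assert rho == i - 2 * i * rho - beta
    return beta, rho


def rate_exponent(i):
    """Exponent e such that the remainder is O(N^(-e)), assuming N ~ M^beta."""
    beta, rho = balanced_exponents(i)
    return rho / beta


for i in (2, 3, 10, 100):
    beta, rho = balanced_exponents(i)
    # Columns: i, beta, rho, exponent of N in M = K * N^(1/beta), rate exponent
    print(i, beta, rho, 1 / beta, rate_exponent(i))
```

For i = 2 this reproduces β = 7/6, ρ = 1/6, M of order N^{6/7} and a remainder of order N^{-1/7}; for larger i the rate exponent (i - 1)/(3i + 1) increases toward 1/3 without reaching it.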
c) The kernel Φ is bounded

First, it will be shown by the following Lemma 5.1 that Pr(|Z - Q| > ε') ≤ O(e^{-ε'^2 M}), where e is the base of the natural logarithm and ε' is some small number. Consequently, an "optimal" choice of M can be obtained in an implicit form.

Lemma 5.1 (Hoeffding's inequality, see [5]). Let U_{n,n} be a U-statistic with n observations on each of any two populations X and Y for estimating some parameter θ. The kernel of θ is Φ(X_r; Y_s), a ≤ Φ ≤ b. Then for any positive number ε',

    Pr(U_{n,n} - θ ≥ ε') ≤ exp( -2ε'^2 [n / max(r, s)] / (b - a)^2 )

    for n large,           = exp(-O(ε'^2 n))

                           = O(e^{-ε'^2 n}) .
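As a rough numerical illustration of Lemma 5.1, the bound can be evaluated directly. This is a sketch under an assumption: the bracket in the exponent is read as the integer part of n / max(r, s), the standard form of Hoeffding's bound for U-statistics, and the function name is hypothetical.

```python
import math


def hoeffding_u_bound(eps, n, r, s, a, b):
    """Upper bound on Pr(U - theta >= eps) for a kernel bounded in [a, b].

    Assumes n observations on each population and a kernel using r X's
    and s Y's, so that k = floor(n / max(r, s)) disjoint blocks are available.
    """
    k = n // max(r, s)
    return math.exp(-2.0 * eps ** 2 * k / (b - a) ** 2)


# Wilcoxon-type kernel (r = s = 1, values in [0, 1]) with eps = 0.1:
for n in (100, 400, 1600):
    print(n, hoeffding_u_bound(0.1, n, 1, 1, 0.0, 1.0))
```

The bound decays like exp(-c ε'^2 n), which is the order used in case c).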

Now, from the proof of Lemma 4.2, and neglecting the smaller order term,

    Pr(|Z - Q| > ε') = Pr(Z - Q > ε') + Pr(Z - Q < -ε')

=  Pr  -  (x  ?  q  .  e>)  t013  ’  lb10  -  (x  ?  ^  .  el  ^  boJ 

>  ‘Ib10  *  (1  -V-eT)  V) 

+  ^  {^1  -  Q  +  e“)  T01  ‘  T10^  '  ^1  -  ft  +  V  ^  b01  "  b10^ 

>  "^1  -  ft  +  eJ  ^  b01  "  b10^  }  * 


Applying  Lemma  5.1  and  assuming  that  r  «  s  and 

2  2 

Ki  <  h  -  (x4  $  ”t)  6  <  ^  ^  <  (rH^T)  8  ’  h  -  h  * 


Pr(|z  -  ft|  >  e‘)  <  exp 


-  (l 


<*  +  «' 


_a^. 


ir)  V 


(*2  -  T 


M 

r 


+  exp 


-2i(r? 


ft  -  e» 
ftte‘^01 


(*3  -  Kkr 


Hence, one has

    Var(U')/V_0 = 1 + O(MN^{-1}) + O(ε') + O(e^{-ε'^2 M}) .

Using a similar approach as before, i.e., requiring the three terms to have the same order of magnitude, one has

    (5.5)  MN^{-1} = ε'

    (5.6)  log ε' = -ε'^2 M .
Substituting (5.5) into (5.6),

    log ε' = -ε'^3 N, hence
    (5.7)  N = -log ε'/(ε')^3
    (5.8)  M = -log ε'/(ε')^2 .

From (5.7), (5.8), M^3 N^{-2} = -log ε' = log(ε')^{-1}. Therefore,

    (5.9)  M = N^{2/3} [log(ε')^{-1}]^{1/3} .


Taking logarithms in (5.7),

    log N = log [log(ε')^{-1}] - log(ε')^3

          = log [log(ε')^{-1}] + 3 log(ε')^{-1} .

It is seen that for ε' small,

    (3 + Δ) log(ε')^{-1} > log N > 3 log(ε')^{-1} .

Substituting the inequalities into (5.7) and (5.9) respectively, one has

            M < N^{2/3} [(1/3) log N]^{1/3} = N^{2/3} [log N^{1/3}]^{1/3}
    (5.10)
            M > N^{2/3} [(1/(3+Δ)) log N]^{1/3} = N^{2/3} [log N^{1/(3+Δ)}]^{1/3} .

By (5.5), M = Nε', one has

    N^{-1/3} [log N^{1/(3+Δ)}]^{1/3} < ε' < N^{-1/3} [log N^{1/3}]^{1/3} .
Therefore, in the case with Φ bounded,

    Var(U')/V_0 = 1 + O(N^{-1/3} I)

where I is some value between (log N^{1/(3+Δ)})^{1/3} and (log N^{1/3})^{1/3}.
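The implicit relations (5.7) to (5.10) can be checked numerically. The following sketch is illustrative only: the function names are hypothetical, and the value of Δ is an arbitrary choice that happens to be large enough for the ε' used (the smallest admissible Δ shrinks as ε' decreases, because of the log log term in log N).

```python
import math


def sizes_from_eps(eps):
    """N and M implied by (5.7) and (5.8) for a given 0 < eps < 1."""
    log_inv = -math.log(eps)      # log(1/eps), positive for eps < 1
    n_total = log_inv / eps ** 3  # (5.7)
    m_first = log_inv / eps ** 2  # (5.8)
    return n_total, m_first


def check(eps, delta):
    """Return M from (5.8), M from (5.9), and the two bounds in (5.10)."""
    n_total, m_first = sizes_from_eps(eps)
    log_inv = -math.log(eps)
    m_from_59 = n_total ** (2 / 3) * log_inv ** (1 / 3)
    upper = n_total ** (2 / 3) * (math.log(n_total) / 3) ** (1 / 3)
    lower = n_total ** (2 / 3) * (math.log(n_total) / (3 + delta)) ** (1 / 3)
    return m_first, m_from_59, lower, upper


m_first, m_from_59, lower, upper = check(eps=0.01, delta=0.5)
print(m_first, m_from_59, lower, upper)
```

With ε' = 0.01 the two expressions for M agree and M falls between the two bounds of (5.10), so M is of order N^{2/3} up to a slowly varying logarithmic factor.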


6.  Some  Examples 

6.1 Consider the Wilcoxon statistic. The class D contains all pairs of cumulative distribution functions F, G which are continuous.

    θ = Pr(X > Y) with the kernel
    Φ(X_i, Y_j) = 1 if X_i > Y_j
                = 0 otherwise .

In this case, r = s = 1. The nuisance parameters b_{10}, b_{01} and b_{11} are

    b_{10} = Pr(X_1 > Y_1, X_1 > Y_2) - [Pr(X_1 > Y_1)]^2

    b_{01} = Pr(X_1 > Y_1, X_2 > Y_1) - [Pr(X_1 > Y_1)]^2

    b_{11} = Pr(X_1 > Y_1) - [Pr(X_1 > Y_1)]^2 .

It can be shown that

    b_{10} = 2 Pr(X_1 > Y_1 > Y_2 > X_2)
    b_{01} = 2 Pr(Y_1 > X_1 > X_2 > Y_2)
    b_{11} = Pr(X_1 > Y_1 > X_2 > Y_2) + Pr(Y_1 > X_1 > Y_2 > X_2)
             + 2 Pr(X_1 > Y_1 > Y_2 > X_2) + 2 Pr(Y_1 > X_1 > X_2 > Y_2) .

The estimators of b_{10}, b_{01} are respectively

    T_{10} = C(M, 2)^{-2} Σ 2h(X_{i_1}, X_{i_2}; Y_{j_1}, Y_{j_2})

    T_{01} = C(M, 2)^{-2} Σ 2g(X_{i_1}, X_{i_2}; Y_{j_1}, Y_{j_2})

where each sum is taken over 1 ≤ i_1 < i_2 ≤ M, 1 ≤ j_1 < j_2 ≤ M, C(M, 2) = M(M - 1)/2 is the binomial coefficient, and

    h(X_{i_1}, X_{i_2}; Y_{j_1}, Y_{j_2}) = 1/4 if the two Y's are ranked between the two X's,
                                          = 0 otherwise,

    g(X_{i_1}, X_{i_2}; Y_{j_1}, Y_{j_2}) = 1/4 if the two X's are ranked between the two Y's,
                                          = 0 otherwise.

Here Φ, g, h are all bounded. When θ is neither zero nor unity, at most one of b_{10}, b_{01} can be zero. Moreover, it can be shown that when b_{10} = 0 (b_{01} = 0), b_{01} = b_{11} = θ - θ^2 (b_{10} = b_{11} = θ - θ^2). If it is assumed that F, G are both strictly monotone, then both b_{10} and b_{01} are positive. Therefore when F, G are both strictly monotone, the two-stage procedure is applicable. Since in this case Φ is bounded, one shall choose M between N^{2/3}[log N^{1/(3+Δ)}]^{1/3} and N^{2/3}[log N^{1/3}]^{1/3}.
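The first-stage estimators for the Wilcoxon example can be sketched by brute force over all pairs of X's and pairs of Y's. This is an illustration, not the authors' computation: the function name is hypothetical, and the normalization by the number of (X-pair, Y-pair) combinations is an assumption based on T_{10}, T_{01} being U-statistics with kernels 2h and 2g.

```python
from itertools import combinations


def wilcoxon_variance_components(xs, ys):
    """Return (T10, T01) from first-stage samples xs and ys.

    h = 1/4 when both Y's lie strictly between the two X's,
    g = 1/4 when both X's lie strictly between the two Y's.
    Ties are ignored, in line with the continuity assumption on F and G.
    """
    x_pairs = list(combinations(xs, 2))
    y_pairs = list(combinations(ys, 2))
    h_sum = 0.0
    g_sum = 0.0
    for x1, x2 in x_pairs:
        x_lo, x_hi = min(x1, x2), max(x1, x2)
        for y1, y2 in y_pairs:
            y_lo, y_hi = min(y1, y2), max(y1, y2)
            if x_lo < y_lo and y_hi < x_hi:
                h_sum += 0.25
            if y_lo < x_lo and x_hi < y_hi:
                g_sum += 0.25
    n_terms = len(x_pairs) * len(y_pairs)
    return 2.0 * h_sum / n_terms, 2.0 * g_sum / n_terms


t10, t01 = wilcoxon_variance_components([0.0, 3.0, 5.0], [1.0, 2.0, 4.0])
print(t10, t01)
```

The cost is of order M^4, which is acceptable for a small first stage; a rank-based computation would be preferable for large M.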

6.2 Assume θ = E(X) - E(Y), where independent observations on populations X and Y are made with cumulative distribution functions F and G respectively. The class D contains all cumulative distribution functions with finite expectations. Then θ is estimable. The kernel is Φ = X_i - Y_j and again r = s = 1. In this case, b_{10} and b_{01} are the population variances, if they exist. The kernels of b_{10}, b_{01} are (1/2)(X_i - X_j)^2, (1/2)(Y_i - Y_j)^2, i < j, respectively. The corresponding U-statistics for estimating b_{10}, b_{01} are the sample variances, which can be expressed in the following form:

    T_{10} = (M - 1)^{-1} Σ_{i=1}^{M} (X_i - X̄)^2 = s_X^2

    T_{01} = (M - 1)^{-1} Σ_{i=1}^{M} (Y_i - Ȳ)^2 = s_Y^2 .
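That the U-statistic with kernel (1/2)(X_i - X_j)^2 coincides with the usual unbiased sample variance can be confirmed on a small data set. The sketch below uses a hypothetical function name and made-up data.

```python
from itertools import combinations
import statistics


def variance_u_statistic(values):
    """Average of the kernel (1/2) * (a - b)^2 over all unordered pairs."""
    pairs = list(combinations(values, 2))
    return sum(0.5 * (a - b) ** 2 for a, b in pairs) / len(pairs)


sample = [1.0, 2.0, 4.0, 7.0]
print(variance_u_statistic(sample), statistics.variance(sample))
```

Both expressions give the same value, so in practice T_{10} and T_{01} can be computed with the ordinary sample-variance formula rather than by summing over pairs.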

In this case, the kernels are not bounded, unless the distributions of X and Y are bounded. b_{10} (b_{01}) is positive if population X (Y) is not a constant with probability one. To apply the theorems of this paper, the distributions of X and Y must have
finite  eighth  moments.  One  may  choose,  say,  M  * 

If  D  contains  normal  distribution  functions  only,  Ghurye  and 
Robbins  have  given  exact  results  for  small  samples  [3] . 

6.3 An example where the theorems of this paper do not apply.

Let the parameter be θ = [E(X)]^2 - [E(Y)]^2, and let F, G belong to any class D such that populations X and Y have zero mean and all finite moments. Now the corresponding symmetric kernel for estimating θ will be Φ = X_1 X_2 - Y_1 Y_2. Then the kernels for b_{10}, b_{01} and b_{11} are of the following forms respectively:

    (X_1 X_2 - Y_1 Y_2)(X_1 X_3 - Y_3 Y_4)

    (X_1 X_2 - Y_1 Y_2)(X_3 X_4 - Y_2 Y_3)

    (X_1 X_2 - Y_1 Y_2)(X_1 X_3 - Y_1 Y_3) .

Since it can be shown that each of these has zero expected value, one cannot use any of the results of this paper. However, the theory of U-statistics is applicable and one needs the kernels for b_{20} and b_{02}, which are given respectively by:

    (X_1 X_2 - Y_1 Y_2)(X_1 X_2 - Y_3 Y_4)

    (X_1 X_2 - Y_1 Y_2)(X_3 X_4 - Y_1 Y_2) .

Then the expected values of these kernels are:

    E(X_1^2 X_2^2) = [Var(X)]^2 > 0

    E(Y_1^2 Y_2^2) = [Var(Y)]^2 > 0 , respectively.

Special attention should also be paid to the fact that in this case, the associated U-statistic may not be asymptotically normally distributed, see [10].

7. The Asymptotic Distribution of U'

In this section, it will be shown that U' is asymptotically normally distributed. Let us consider first two random variables Y' and Y* defined as follows:

    Y' = (U' - θ) / (E_{m'}[Var(U'_{m'})])^{1/2}

    Y* = (U_{N'Q,N'(1-Q)} - θ) / [Var(U_{N'Q,N'(1-Q)})]^{1/2} .

It has been proved by Rosenblatt [10, Theorem 2.2] that Y* is asymptotically normal with mean zero and variance one. In what follows, Theorem 7.1 shows that Y' is asymptotically equivalent to Y*, thus also asymptotically normally distributed with mean zero and variance one.

Theorem 7.1. Y' and Y* are asymptotically equivalent, i.e.,
Proof: In order to show that Y' and Y* are asymptotically equivalent, it suffices to show that

    E(Y' - Y*)^2 → 0 as N' → ∞ .

Now, E(Y' - Y*)^2 = E(Y')^2 + E(Y*)^2 - 2E(Y'Y*) .

From Theorem 4.1, U' is an unbiased estimator of θ and Y' is its normalized form. Hence E(Y')^2 = 1. By assumption, E(Y*)^2 = 1.

Also, by Theorem 4.1,

    E_{m'} Var(U'_{m'}) = (1/N')[r(b_{10})^{1/2} + r(b_{01})^{1/2}]^2 + o(N'^{-1})

and

    Var(U_{N'Q,N'(1-Q)}) = (r^2 b_{10})/(N'Q) + (r^2 b_{01})/(N'(1-Q)) + o(N'^{-1})

by Rosenblatt [10, Lemma 2.6]. Therefore,

(7.1)  E(y’Y*)  3  E[(U‘  -  )-©)]  • 

-  “'MV*  *  r(i.01)ir2E[(u'  - 

Now let Ũ' be the statistic U' with the kernel Φ'(X_r; Y_r), and let Ũ_{N'Q,N'(1-Q)} be the statistic U_{N'Q,N'(1-Q)} with the kernel Φ'(X_r; Y_r). Also let

    C = condition that N'(Q - M^{-ρ}) ≤ m' ≤ N'(Q + M^{-ρ}), 1/4 > ρ > 0 ,

    C' = complement of C.


Then,  one  may  write 
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E(¥’Y*)  *  H'lr(b10)S  +  r(b01)V2 
+  N'tr(b10)^  ♦  r(b01)^]'2 

‘  B  (VeC'^^'^VQ^'d-Q) 


m' ec]  f  Pr(m’ eC) 


)  m'eC']}  Pr(m'€C')  . 


Now notice that E(Y'Y*) is the correlation coefficient of Y' and Y*, and since both Y' and Y* are functions of the random variables X_{M+1}, ..., X_{M+m'}; Y_{M+1}, ..., Y_{M+n'}, one has 1 ≥ E(Y'Y*) ≥ 0, and for any m', n' ≥ r.

Consequently, 


E(y'Y*)  > 


V +  r(Vt]2 


“(VeC^'V^N'd-Q)  m'eC)l 


Pr^eC) 


lr(*10)5  +  r(bni  j*f 


“01' 


E{&N'(Q+e),N,(l-Q-e)&U'Q,N,(l-Cl)] 


♦  0(M'2+^)  , 

by Lemma 4.2, where ε denotes some value in the interval (-M^{-ρ}, M^{-ρ}). Notice,

E^'(Q+e),N'(l-Q-e)  Vq,M'(H)^ 

»  (*'(Ve))‘1(«,(l-«-e))"1(H,Q)‘1(N,(l-Q)j‘1 

•  Y^)*'^  {  *t  )], 
where Σ_1, Σ_2, Σ_3, Σ_4 are sums over all sets of integers,

    M + 1 ≤ i_1 < ... < i_r ≤ N'(Q + ε)

    M + 1 ≤ j_1 < ... < j_r ≤ N'(1 - Q - ε)

    M + 1 ≤ k_1 < ... < k_r ≤ N'Q

    M + 1 ≤ t_1 < ... < t_r ≤ N'(1 - Q), respectively.

The expectation of Φ'(X_{i_r}; Y_{j_r}) Φ'(X_{k_r}; Y_{t_r}) is zero when the sets (i_1, ..., i_r), (k_1, ..., k_r) have no integer in common and the sets (j_1, ..., j_r), (t_1, ..., t_r) have no integer in common. On the other hand, the expectation will become b_{cd} if there are c common integers in the former pair of sets and d common integers in the latter pair of sets. Therefore, the number of sets having (c, d) integers in common are: for all ε non-negative,

    (7.2)  C(r, c) C(N'Q, r) C(N'(Q+ε) - c, r - c) C(r, d) C(N'(1-Q-ε), r) C(N'(1-Q) - d, r - d)

where C(n, k) denotes the binomial coefficient,
and  for  all  e  non-positive. 


(7-3) 


Note that for ε identically zero, Y' = Y* .
Consider (7.2), which, for N' large, is

    (N'Q)^r [N'(Q+ε) - c]^{r-c} / (c! [(r-c)!]^2)
        · [N'(1-Q-ε)]^r [N'(1-Q) - d]^{r-d} / (d! [(r-d)!]^2) + lower order terms

    = (N')^{4r-c-d} Q^r (Q+ε)^{r-c} (1-Q-ε)^r (1-Q)^{r-d} / (c! d! [(r-c)! (r-d)!]^2)
        + o((N')^{4r-c-d}) .                                            (7.4)


On the other hand, the coefficient before the summation sign is

    (7.5)  (N')^{-4r} (1-Q-ε)^{-r} (Q+ε)^{-r} (1-Q)^{-r} Q^{-r} (r!)^4 + o[(N')^{-4r}] .

Combining (7.4), (7.5) and the above discussion, one has

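    E{Ũ_{N'(Q+ε),N'(1-Q-ε)} Ũ_{N'Q,N'(1-Q)}}

    = Σ_{c=0}^{r} Σ_{d=0}^{r}, (c,d) ≠ (0,0), of
        [ (N')^{4r-c-d} Q^r (1-Q)^{r-d} (Q+ε)^{r-c} (1-Q-ε)^r / (c! d! [(r-c)! (r-d)!]^2) + o((N')^{4r-c-d}) ]
        · [ (N')^{-4r} (1-Q-ε)^{-r} (Q+ε)^{-r} (1-Q)^{-r} Q^{-r} (r!)^4 + o((N')^{-4r}) ] b_{cd}

    = Σ_{c=0}^{r} Σ_{d=0}^{r}, (c,d) ≠ (0,0), of
        (r!)^4 b_{cd} / { (N')^{c+d} (Q+ε)^c (1-Q)^d c! d! [(r-c)! (r-d)!]^2 } + lower order terms .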

It is seen that for smaller c + d, the term is large, and as c + d becomes large the term becomes small in order of magnitude. Retaining the largest order of magnitude terms (c, d) = (1, 0) and (c, d) = (0, 1), one has,

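    E{Ũ_{N'(Q+ε),N'(1-Q-ε)} Ũ_{N'Q,N'(1-Q)}}

    = r^2 b_{10} N'^{-1} (Q+ε)^{-1} + r^2 b_{01} N'^{-1} (1-Q)^{-1}

    = r^2 b_{10} [N'Q]^{-1} [1 + ε/Q]^{-1} + r^2 b_{01} [N'(1-Q)]^{-1} + o(N'^{-1})

    = r^2 b_{10} [N'Q]^{-1} [1 + O(ε)] + r^2 b_{01} [N'(1-Q)]^{-1} + o(N'^{-1})

    = (N')^{-1} [r(b_{01})^{1/2} + r(b_{10})^{1/2}]^2 + O(N'^{-1} M^{-ρ}),

where ε → 0 slowest for ε near M^{-ρ} .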
With essentially similar steps, one will find the same result if ε is non-positive. Thus


1  >  E(YY*)  > - - rg]  ' 

Irtb^  +  r  (b01)*]SL 


r[r(bln)i  .  r(1>m)ij2 


"10 '  ■  *v“01' 

tT- . 


+  0(H''VP)}  +0(M"2+4p+P) 

or 1 ≥ E(Y'Y*) ≥ 1 + O(M^{-ρ}) + O(M^{-2+4ρ+β}) .


Therefore,

    Limit_{N'→∞} E(Y' - Y*)^2 = Limit_{N'→∞} [2 - 2 + O(M^{-ρ})] = 0 .

Theorem 7.1 is proved.

Corollary 7.1. Y' is also asymptotically normally distributed with mean zero and variance one, or, U' is asymptotically normally distributed.

Next, in the expression for Y' defined above, if one substitutes the estimated variance (in terms of the values of T_{10} and T_{01}) for the variance of U' (in terms of the values of b_{10} and b_{01}), then, since T_{10} and T_{01} are efficient estimators of b_{10} and b_{01}, the resulting standardized random variable Y'_s is also asymptotically normally distributed with mean zero and variance one.

Theorem 7.2. Y'_s = (U' - θ) N'^{1/2} / [r(T_{10})^{1/2} + r(T_{01})^{1/2}] is asymptotically equivalent to Y'.

Proof: It suffices to show that N'^{-1/2} (r(T_{10})^{1/2} + r(T_{01})^{1/2}) is asymptotically equivalent to N'^{-1/2} (r(b_{10})^{1/2} + r(b_{01})^{1/2}) (see [6], Theorem 3 and applications), i.e.,

    P-limit [N'^{-1/2} (r(T_{10})^{1/2} + r(T_{01})^{1/2}) - N'^{-1/2} (r(b_{10})^{1/2} + r(b_{01})^{1/2})] = 0.

In other words, it is equivalent to show that for any ε' > 0,

    Limit_{N'→∞} Pr{ |r(T_{10})^{1/2} + r(T_{01})^{1/2} - r(b_{10})^{1/2} - r(b_{01})^{1/2}| > ε' N'^{1/2} } = 0.

Now,

    Pr{ |r(T_{10})^{1/2} + r(T_{01})^{1/2} - r(b_{10})^{1/2} - r(b_{01})^{1/2}| > ε' N'^{1/2} }

    ≤ Pr{ |r(T_{10})^{1/2} - r(b_{10})^{1/2}| + |r(T_{01})^{1/2} - r(b_{01})^{1/2}| > ε' N'^{1/2} }

    (7.6)
    ≤ Pr[ 2|r(T_{10})^{1/2} - r(b_{10})^{1/2}| > ε' N'^{1/2} ]

      + Pr[ 2|r(T_{01})^{1/2} - r(b_{01})^{1/2}| > ε' N'^{1/2} ] .

Using Tchebycheff's inequality,

    ≤ E{2[r(T_{10})^{1/2} - r(b_{10})^{1/2}]}^2 / (N' ε'^2) + E{2[r(T_{01})^{1/2} - r(b_{01})^{1/2}]}^2 / (N' ε'^2) .


Applying  the  identity  (see  (l)  p.  553), 
i  A  a  •  i  (»  -  if 

a  ■*  •  ssj*  • 


one  has 


E{2tr(Tlo)4  •  r(b10)4)) 


f' 


-a  IVVf-lglV’- 

“  1|V(Tio>*t<bio)41 


(T. 


10 


V 


+  2 - r -  4 


<  r2  |i 


^T10‘b10^ 

*10 


+  2  EC 


-^Tio~bio^- 


-} 


.]  +  E 


(Tio"biol 


10 


}• 


10 
By Lemma 4.2, and by the fact that it is also easy to extend it to

gU  1  mu 

the  case  for  odd  moments  of  a  U-statistic,  i.e.,  E(U-0)  ■  0(M  ), 


one  has 


(tiq  -  \Qf  , 

E(-3f - 12-)  .  o(M"1) 

D10 


(T.n  -  b  Y 

E(  102  10 -)  -  0{M'2) 

4 


(T10  -  \0)  .2 

E(-~ - — )  =  0(M  2) 

*10 


Consequently, 

ry 

E{2[r(T  )*  -  r(b  )*}"  -1 

(7-7)  — = - — - * - — — —  -  0(N‘Me'2) 

NV 


with  suitable  choice  of  e  ,  as  N  — >  oo . 

With  exactly  the  same  kind  of  argument,  one  has 


(7-8) 


B{2[r(T0J*  -  r(b  )*]} 


NV2 


0  as  N1  - >  oo  , 


Combining (7.7), (7.8) and putting them into (7.6), one has,

    Limit_{N'→∞} Pr[ |r(T_{10})^{1/2} + r(T_{01})^{1/2} - r(b_{10})^{1/2} - r(b_{01})^{1/2}| > ε' N'^{1/2} ]

    ≤ Limit_{N'→∞} O(N' M ε'^2)^{-1} = 0 .

This proves the asymptotic equivalence of N'^{-1/2}[r(T_{10})^{1/2} + r(T_{01})^{1/2}] and N'^{-1/2}[r(b_{10})^{1/2} + r(b_{01})^{1/2}] and hence the asymptotic equivalence of Y'_s to Y'.
Corollary 7.2. Y'_s is asymptotically normally distributed.

8. Extension of the Two-stage Technique to Functions of More Than Two Populations

So far, the problem of two-stage estimation has been studied with estimable functions (or parameters) of two populations. The technique can apparently be extended to functions of more than two populations.

Let X^{(1)}, ..., X^{(k)} be k populations (k > 2) with cumulative distribution functions F_1(X), ..., F_k(X) respectively. Also let θ = θ(F_1, ..., F_k) be the functional to be estimated, with symmetric kernel Φ(X^{(1)}, ..., X^{(k)}), where X^{(i)} represent vectors of dimension r, i.e., X^{(i)} are r independent observations on population X^{(i)}.

The corresponding U-statistic with n_i observations on X^{(i)} will be


where Σ is the k-fold summation over a set of integers such that for each vector of integers, i = 1, 2, ..., k, 1 ≤ j_1 < ... < j_r ≤ n_i. Analogously, define

-  ♦(x^1V>«;^kb  -  e 

♦  '  (x(l>;..,;x(k))  =  E*«(x(l)  X(l);...;x(k)  £{k)), 

V  *",ak 

As) 


where  x  represent  vectors  of  dimension  Sj,  Sj  =0,  1,  ...,  r 
for  all  J  ■  1,  2,  . ...  k;  X^  represent  vectors  of  dimension 
**  -  (r  -  ftj)  *=  ( ft j )  for  &11  J  =  2^  ■>  •  *  j  fc* 
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for  vectors  f(j)  of  dimension  a^,  aj  ■  0,  1,  . ..,  r  for  all 
J  *  1,  •  • « ,  k » 

Then  vhen  the  variance  of  ♦'  exists,  one  can  write 
k  n.  -1  r 


k  n.  -l  r  r  „  n,-r  _  n. -r 

Var(uJ  =  [  0  (  *)]  £  ...  £  (*  )C.  )...(!  )(_„  )b 

1=1  a1=0  a^1 


k  'r  . V'^Vl-'k 


Now, when a fixed total sample size N is given, Σ_{i=1}^{k} n_i = N, and if n_i → ∞ in such a way that n_i/n_j are bounded away from zero and one, for all i ≠ j, i, j = 1, 2, ..., k, then the asymptotic expression for Var(U_k) is

    Var(U_k) ≅ Σ_{i=1}^{k} r^2 b^{(i)} / n_i = V_1 , say,

where b^{(i)} = b_{0,...,1,...,0}, with one of the subscripts 1 at the i-th position and zero elsewhere.

Analogous to the case of k = 2, it is easy to show that V_1 is minimized when

    n_i = N (b^{(i)})^{1/2} / Σ_{i=1}^{k} (b^{(i)})^{1/2}

and the minimized value of V_1 is V_0,

    V_0 = N^{-1} [ Σ_{i=1}^{k} r (b^{(i)})^{1/2} ]^2 ,

when b^{(i)}, i = 1, 2, ..., k, are all known.
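The optimal one-stage allocation for k populations can be illustrated numerically. The sketch below is an assumption-laden illustration: it takes the asymptotic variance to be the sum over i of r^2 b^{(i)} / n_i (the k-population analogue of the two-population expression used in Section 7), the function names are hypothetical, and non-integer sample sizes are left unrounded.

```python
import math


def optimal_allocation(b, n_total):
    """Sample sizes proportional to the square roots of the b^(i)."""
    roots = [math.sqrt(value) for value in b]
    total = sum(roots)
    return [n_total * root / total for root in roots]


def asymptotic_variance(b, n, r=1):
    """V_1: sum over populations of r^2 * b^(i) / n_i."""
    return sum(r ** 2 * b_i / n_i for b_i, n_i in zip(b, n))


def minimized_variance(b, n_total, r=1):
    """V_0: (sum of r * sqrt(b^(i)))^2 / N."""
    return (r * sum(math.sqrt(value) for value in b)) ** 2 / n_total


b = [1.0, 4.0, 9.0]
n_total = 600
n_opt = optimal_allocation(b, n_total)
n_equal = [n_total / len(b)] * len(b)
print(n_opt)
print(asymptotic_variance(b, n_opt), minimized_variance(b, n_total))
print(asymptotic_variance(b, n_equal))
```

At the optimal allocation V_1 coincides with V_0, and an equal split gives a strictly larger value whenever the b^{(i)} are not all equal.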

The two-stage estimating procedure will be as follows:

(a) Take M observations on each of the X^{(i)}, i = 1, 2, ..., k, where 2rk ≤ Mk ≤ N - rk .

(b) Estimate the k unknowns b^{(i)}, i = 1, 2, ..., k, by, say, T^{(i)}.

(c) Take m_i more observations on X^{(i)}, i = 1, 2, ..., k, such that

    m_i = N' (T^{(i)})^{1/2} / Σ_{i=1}^{k} (T^{(i)})^{1/2} , where N' = N - kM ,

for the estimator using only the second-stage observations.

(d) Use U'_k, the analogous two-stage estimator of U', to estimate θ.

It can be shown, using essentially the same arguments and under the same kind of conditions as in Theorem 4.1, but replacing the condition (iii) by (iii'):

    b^{(i)} > 0, i = 1, 2, ..., k ;

that the analogous result can be obtained. Also, the asymptotic distribution of U'_k is again normal.

9. Summary of Results on U''

In this section, results on U'' will be summarized without proof. U'' is introduced in order to utilize the data from the first-stage samples as well as the second-stage samples. (Recall that U' is constructed based on the second-stage samples only.)

The first stage for U'' is defined exactly the same as for U' (see Section 3). However, in the second stage, m'' (not m') more observations from population X and n'' (not n') more observations from population Y will be taken such that m'' + n'' = N' = N - 2M. The sample sizes m'' and n'' are determined as follows:

           m'' = [NZ] - M   when (M + 1)/N ≤ Z ≤ (N - M)/N

    (9.1)  m'' = 0          when Z < (M + 1)/N

           m'' = N'         when Z > (N - M)/N

    and    n'' = N' - m'' .
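Rule (9.1) translates directly into a small function. This is a sketch with a hypothetical function name; it reads the three displayed cases as rules for m'' (with n'' = N' - m''), treats [NZ] as the integer part of NZ, and takes Z, the allocation fraction estimated from the first stage, as a given input.

```python
import math


def second_stage_sizes(n_total, m_first, z):
    """Return (m2, n2): second-stage sample sizes for populations X and Y."""
    n_prime = n_total - 2 * m_first
    if z < (m_first + 1) / n_total:
        m2 = 0                     # first-stage X sample already exceeds N*Z
    elif z > (n_total - m_first) / n_total:
        m2 = n_prime               # all remaining observations go to X
    else:
        m2 = math.floor(n_total * z) - m_first
    return m2, n_prime - m2


print(second_stage_sizes(100, 10, 0.375))  # interior case
print(second_stage_sizes(100, 10, 0.05))   # Z too small
print(second_stage_sizes(100, 10, 0.95))   # Z too large
```

In the interior case the total number of X observations, M + m'', equals [NZ], so the first-stage observations count toward the target allocation rather than being set aside.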

The statistic U'' is defined as the estimator of θ based on M + m'' and M + n'' observations on populations X and Y respectively. The definition is:

    (9.2)  U'' = C(M + m'', r)^{-1} C(M + n'', r)^{-1} Σ Φ(X_{i_1}, ..., X_{i_r}; Y_{j_1}, ..., Y_{j_r})

where C(n, r) denotes the binomial coefficient and the summation is taken over all sets of integers,

    1 ≤ i_1 < ... < i_r ≤ M + m''; 1 ≤ j_1 < ... < j_r ≤ M + n''.

The statistic U'' is biased. However, with the help of two lemmas, the following theorem is proved.

Theorem 9.1. If the conditions of Theorem 4.1 are satisfied,
then  E(u")  -  0  +  0(M"1'*ep  +  M^~P),  and  that 

    Limit_{N→∞} E(U'' - θ)^2 / V_0 = 1 .

Next, it is found that there is no "optimal" choice of M relative to N so that the ratio E(U'' - θ)^2/V_0 converges to unity as quickly as possible. However, it is also found that in any case, this ratio converges to unity not slower than the ratio Var(U')/V_0, if the same set of values of β and ρ are used.

Again, although U'' is biased, it can be shown, using techniques similar to those for U', that U'' is also asymptotically normally distributed, either in terms of its variance or in terms of the estimated variance, i.e., with b_{10} (b_{01}) replaced by T_{10} (T_{01}). Finally, using steps analogous to those for U', the results on U'' are generalized to cover the situation of sampling from several populations.